17.4 Extrinsic Methods


In summary, the central themes of sequence comparison are:¹⁸ distance functions

appropriate in the absence of natural correspondence of elements; optimum corre-

spondences between sequences; and dynamic programming algorithms (Sect. 17.4.4)

for calculating the distances and optimum correspondences.

17.4.3 Trace, Alignment, and Listing

These are, perhaps, the three most important modes of presentation for the analysis

of differences between sequences. A trace consists of the source sequence above and

the target sequence below, with lines, at most one per element and not crossing each

other, from some elements in the source to some in the target. The lines provide

at least a partial correspondence between source and target. There are two kinds of

matches of a pair: if the connected elements are the same, they are referred to as

an identity or a continuation; if they are different, a substitution. A source element

without a line is referred to as a deletion; a target element without a line, as an insertion (the term

indel means either an insertion or a deletion). This is illustrated below.

Problem. Construct as many different analyses as possible of the above pair of

sequences using trace.
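The rules for a trace can be checked mechanically. The following is a minimal Python sketch, not taken from the text: it assumes a trace is given as a list of index pairs `(i, j)`, each joining `source[i]` to `target[j]`, and the function name `classify_trace` is hypothetical. It enforces the two constraints stated above (at most one line per element, no crossing lines) and labels each element as a continuation, substitution, deletion, or insertion.

```python
def classify_trace(source, target, lines):
    """Classify the elements of a trace (illustrative sketch).

    `lines` holds (i, j) pairs, each a line from source[i] to target[j].
    This representation is an assumption made for the example.
    """
    lines = sorted(lines)
    src_used = [i for i, _ in lines]
    tgt_used = [j for _, j in lines]
    src_set, tgt_set = set(src_used), set(tgt_used)

    # At most one line per element, on either side.
    if len(src_set) != len(src_used) or len(tgt_set) != len(tgt_used):
        raise ValueError("an element has more than one line")

    # Lines must not cross: with source indices ascending,
    # the target indices must ascend as well.
    if any(b < a for a, b in zip(tgt_used, tgt_used[1:])):
        raise ValueError("lines cross")

    ops = []
    for i, j in lines:
        kind = "continuation" if source[i] == target[j] else "substitution"
        ops.append((kind, source[i], target[j]))
    # Source elements without a line are deletions.
    ops += [("deletion", source[i], None)
            for i in range(len(source)) if i not in src_set]
    # Target elements without a line are insertions.
    ops += [("insertion", None, target[j])
            for j in range(len(target)) if j not in tgt_set]
    return ops


# One possible trace for INDUSTRY -> INTEREST: I-I, N-N, T-T, R-R.
ops = classify_trace("INDUSTRY", "INTEREST", [(0, 0), (1, 1), (5, 2), (6, 4)])
```

Each different admissible set of lines gives a different analysis of the same pair, which is the point of the problem above.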

An alignment or matching consists of, again, the source sequence above and the

target below, forming a two-row matrix. Both rows can be interspersed with null

characters (represented by ∅, or a minus sign, or simply a blank); note that a column of null

characters is not permitted. A deletion has the null character below; a column with

the null character above is an insertion. The absence of ∅ denotes a match; if the

elements are equal it is a continuation, if unequal a substitution:

$$I(s_a, s_b) = I(s_b, s_a) = I(s_a) - I(s_a \mid s_b) = I(s_b) - I(s_b \mid s_a).$$

[ I N D U S T R Y ]
[ I N T E R E S T ]

Problem. Construct as many different analyses as possible of the above pair of

sequences using alignment.
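Reading an alignment column by column can be sketched in the same way. The Python fragment below is an illustration only: it assumes the two rows are given as equal-length strings and uses `"-"` as the null character, and the name `classify_alignment` is hypothetical. A column with the null character below is a deletion, one with the null character above is an insertion, and a column without a null character is a continuation or a substitution; a column of two null characters is rejected.

```python
NULL = "-"  # assumed stand-in for the null character


def classify_alignment(top, bottom, null=NULL):
    """Label each column of a two-row alignment (illustrative sketch)."""
    if len(top) != len(bottom):
        raise ValueError("both rows must have the same number of columns")
    kinds = []
    for a, b in zip(top, bottom):
        if a == null and b == null:
            raise ValueError("a column of null characters is not permitted")
        if b == null:
            kinds.append("deletion")      # source element, null below
        elif a == null:
            kinds.append("insertion")     # null above, target element
        elif a == b:
            kinds.append("continuation")  # equal elements
        else:
            kinds.append("substitution")  # unequal elements
    return kinds


# The alignment without null characters: two continuations, six substitutions.
plain = classify_alignment("INDUSTRY", "INTEREST")

# An alternative analysis of the same pair, using null characters.
gapped = classify_alignment("INDUST-RY---", "IN---TER-EST")
```

Removing the null characters from each row recovers the original sequences, so both calls analyse the same pair in different ways.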

18 Kruskal (1964), Chap. 1 of Sankoff and Kruskal (1999).